A proposition of a robust system for historical document images indexation
نویسندگان
چکیده
Characterizing noisy or ancient documents is a challenging problem up to now. Many techniques have been done in order to effectuate feature extraction and image indexation for such documents. Global approaches are in general less robust and exact than local approaches. That’s why, we propose in this paper, a hybrid system based on global approach (fractal dimension), and a local one, based on SIFT descriptor. The Scale Invariant Feature Transform seems to do well with our application since it is rotation invariant and relatively robust to changing illumination. In the first step the calculation of fractal dimension is applied to images, in order to eliminate images which have distant features than image request characteristics. Next, the SIFT is applied to show which images match well the request. However, the average matching time using the hybrid approach is better than “fractal dimension” and “SIFT descriptor” techniques, if they are used alone.
منابع مشابه
Ancient Printed Documents Indexation: A New Approach
Based on the study of the specificity of historical printed books and on the main error sources of classical methods of page layout analysis, this paper presents a new way to achieve an indexation of ancient printed documents. We have developed an approach based on the extraction and the quantification of the various orientations that are present in printed document images. The documents are in...
متن کاملLearning Document Image Features With SqueezeNet Convolutional Neural Network
The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...
متن کاملOn the Influence of Word Representations for Handwritten Word Spotting in Historical Documents
Word spotting is the process of retrieving all instances of a queried keyword from a digital library of document images. In this paper we evaluate the performance of different word descriptors to assess the advantages and disadvantages of statistical and structural models in a framework of query-by-example word spotting in historical documents. We compare four word representation models, namely...
متن کاملRobust Potato Color Image Segmentation using Adaptive Fuzzy Inference System
Potato image segmentation is an important part of image-based potato defect detection. This paper presents a robust potato color image segmentation through a combination of a fuzzy rule based system, an image thresholding based on Genetic Algorithm (GA) optimization and morphological operators. The proposed potato color image segmentation is robust against variation of background, distance and ...
متن کاملDocument Analysis And Classification Based On Passing Window
In this paper we present Document analysis and classification system to segment and classify contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in the preprocessing by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1308.6319 شماره
صفحات -
تاریخ انتشار 2010